Coordinating idle state transitions in multi-core processors

ABSTRACT

Systems and methods of managing processors provide for detecting a command at a core of a processor having a plurality of cores, where the command requests a transition of the core to an idle state. Power consumption of the core is managed based on the command and an idle state status of each of the plurality of cores.

BACKGROUND

1. Technical Field

One or more embodiments of the present invention generally relate to power management. In particular, certain embodiments relate to managing power consumption in multi-core processors.

2. Discussion

As the trend toward advanced central processing units (CPUs) with more transistors and higher frequencies continues to grow, computer designers and manufacturers are often faced with corresponding increases in power and energy consumption. Furthermore, manufacturing technologies that provide faster and smaller components can at the same time result in increased leakage power. Particularly in mobile computing environments, increased power consumption can lead to overheating, which may negatively affect performance, and can significantly reduce battery life. Because batteries typically have a limited capacity, running the processor of a mobile computing system more than necessary could drain the capacity more quickly than desired.

Some modern mobile computing systems attempt to conserve power by placing the processor in various power/idle states when there are no instructions to be executed. It should be noted, however, that these solutions are typically tailored for single core processors. As a result, traditional approaches only need to consider the status of a single core when managing power and making power state transition determinations. In addition, it is common for power management to be implemented at the operating system (OS) level, which may be too slow as processor architectures become more complex.

BRIEF DESCRIPTION OF THE DRAWINGS

The various advantages of the embodiments of the present invention will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:

FIG. 1 is a block diagram of an example of a multi-core processor according to one embodiment of the invention;

FIG. 2 is a block diagram of an example of a computing system according to one embodiment of the invention;

FIG. 3 is a flowchart of an example of a method of managing core idle power according to one embodiment of the invention;

FIG. 4 is a flowchart of an example of a process of managing power consumption according to one embodiment of the invention;

FIG. 5A is a flowchart of an example of a process of initiating dedicated power saving features according to one embodiment of the invention;

FIG. 5B is a flowchart of an example of a process of initiating shared power saving features according to one embodiment of the invention;

FIG. 6A is a flowchart of an example of a process of detecting a command according to one embodiment of the invention;

FIG. 6B is a flowchart of an example of a process of detecting a command according to an alternative embodiment of the invention; and

FIG. 7 is a state diagram of an example of a power management state machine according to one embodiment of the invention.

DETAILED DESCRIPTION

FIG. 1 shows a processor 10 having a plurality of cores 12 (12 a-12 b), where each core 12 is fully functional with instruction fetch units, instruction decoders, level one (L1) cache 14 (14 a-14 b), execution units, and so on. While the illustrated processor 10 has two cores 12, the embodiments of the invention are not so limited. Indeed, the techniques described herein can be useful for any multi-core architecture for which power consumption is an issue of concern. Thus, any number of cores may be used without departing from the spirit and scope of the embodiments described herein.

Each core 12 is able to detect a command that requests a transition of the core 12 to an idle state. The command may originate internally within the core 12 or external to the core 12. The idle state could be a processor power state such as one of the “C-states” described in the Advanced Configuration and Power Interface (ACPI, Ver. x285, June 2004) Specification. Generally, deeper idle states are associated with lower power consumption and longer exit latency. The following table demonstrates one approach to specifying C-state latencies. Other approaches may also be used.

TABLE I
Field        Byte Length  Byte Offset  Description
P_LVL2_LAT   2            96           The worst-case hardware latency, in microseconds, to enter and exit a C2 state. A value >100 indicates the system does not support a C2 state.
P_LVL3_LAT   2            98           The worst-case hardware latency, in microseconds, to enter and exit a C3 state. A value >1000 indicates the system does not support a C3 state.
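
To make the Table I layout concrete, the following Python sketch decodes the two fields from a raw table buffer. It assumes little-endian byte order, which ACPI uses for its tables. The function name and the zero-filled example buffer are hypothetical; only the offsets, field widths and support thresholds come from Table I.

```python
import struct
from typing import Dict, Optional

# Offsets, widths and "not supported" thresholds taken from Table I.
P_LVL2_LAT_OFFSET = 96       # 2-byte field
P_LVL3_LAT_OFFSET = 98       # 2-byte field
C2_UNSUPPORTED_ABOVE = 100   # microseconds
C3_UNSUPPORTED_ABOVE = 1000  # microseconds


def decode_c_state_latencies(fadt: bytes) -> Dict[str, Optional[int]]:
    """Return worst-case C2/C3 entry+exit latencies in microseconds.

    None means the table marks that state as unsupported.
    '<H' reads an unsigned 16-bit little-endian value.
    """
    (c2_lat,) = struct.unpack_from("<H", fadt, P_LVL2_LAT_OFFSET)
    (c3_lat,) = struct.unpack_from("<H", fadt, P_LVL3_LAT_OFFSET)
    return {
        "C2": c2_lat if c2_lat <= C2_UNSUPPORTED_ABOVE else None,
        "C3": c3_lat if c3_lat <= C3_UNSUPPORTED_ABOVE else None,
    }


# Hypothetical buffer: zero-filled except for the two latency fields.
example = bytearray(100)
struct.pack_into("<H", example, P_LVL2_LAT_OFFSET, 20)    # C2: 20 us
struct.pack_into("<H", example, P_LVL3_LAT_OFFSET, 1001)  # C3: unsupported
print(decode_c_state_latencies(bytes(example)))  # {'C2': 20, 'C3': None}
```

Note that the thresholds are exclusive: a C2 latency of exactly 100 microseconds still counts as supported.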

The processor 10 can have a level two (L2) cache 20 that is shared by the cores 12. The L1 caches 14, on the other hand, may be dedicated to their respective cores 12. As will be discussed in greater detail below, the dedicated nature of the L1 caches 14 provides an opportunity for per-core power management. The cores 12 also have dedicated clock inputs 15 (15 a-15 b) that can be gated to obtain power savings on a per-core basis. Hardware coordination logic 16 can manage power consumption of a given core 12 based on the command and an idle state status 18 (18 a-18 b) of each of the plurality of cores 12. By coordinating multiple cores 12 and multiple idle state statuses 18, the illustrated processor 10 is able to support more complex architectures and can respond more quickly to condition changes than traditional software approaches. The illustrated processor 10 can also use the coordination logic 16 to initiate power saving features in advance of actual power state transitions. The result can be significant power savings.

For example, it might be determined that, based on a lack of utilization, the C4 state is appropriate for the first core 12 a. The C4 state, which is deep in relation to the other C-states, is typically associated with a shared resource such as a package-wide voltage and/or frequency setting. The second core 12 b, on the other hand, may be in an active state. Under such conditions, the coordination logic 16 could transition the first core 12 a to a “tentative” state that involves the initiation of certain dedicated power saving features so that the first core 12 a is still able to conserve power. Furthermore, if the second core 12 b subsequently receives a request to transition to the C4 state, the coordination logic 16 can also initiate shared power saving features to conserve more power while the cores 12 are being transitioned into the C4 state. Similar advantages can be achieved for the other idle states by detecting when all cores are transitioning to the same state.

FIG. 2 shows a system 22 having a processor 10′ with a plurality of cores 12′ (12 a′-12 b′) and hardware coordination logic 16′ as already described. The illustrated system 22 also includes one or more input/output (I/O) devices 24, a random access memory (RAM) 26 and a read only memory (ROM) 28 coupled to the processor 10′ by way of a chipset 30. The RAM 26 and ROM 28 store instructions 32 that can be executed as one or more threads and/or processes by the cores 12′, where execution of the instructions 32 can lead to increased power consumption. As idle state transition commands are received by the cores 12′ from the chipset 30 and/or operating system (OS), the hardware coordination logic 16′ is able to substantially reduce power consumption for the system 22.

Turning now to FIG. 3, a method 34 of managing core idle power is shown. The method 34 may be implemented using any combination of hardware and/or software programming techniques. For example, the method 34 may be implemented in a reduced instruction set computer (RISC) multi-core processor as fixed functionality hardware, microcode, or any combination thereof. In particular, processing block 36 provides for detecting a command at a core of a processor having a plurality of cores. The command can request a transition of the core to an idle state. Power consumption of the core is managed at block 38 based on the command and an idle state status of each of the plurality of cores. Thus, the status of one core can be taken into consideration when managing the idle power of another core.
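
As an illustration only, the per-core idle state status that blocks 36 and 38 rely on can be modeled in software as a small table. The method itself is described as hardware or microcode; the class `IdleStatusTable`, its methods, and the integer encoding (0 for the active C0 state, x for a requested Cx) are hypothetical choices made for this sketch.

```python
from typing import Dict, Optional

C0 = 0  # active state; a value x > 0 means the core requested Cx


class IdleStatusTable:
    """Hypothetical model of the per-core idle state status (18 a-18 b)."""

    def __init__(self, num_cores: int) -> None:
        self.requested: Dict[int, int] = {core: C0 for core in range(num_cores)}

    def detect_command(self, core: int, c_state: int) -> None:
        """Block 36: record that `core` was asked to enter C<c_state>."""
        if c_state <= C0:
            raise ValueError("an idle request must target C1 or deeper")
        self.requested[core] = c_state

    def break_event(self, core: int) -> None:
        """An interrupt or monitor hit returns the core to C0."""
        self.requested[core] = C0

    def all_idle(self) -> bool:
        """True when no core is active (the block 42 check of FIG. 4)."""
        return all(state != C0 for state in self.requested.values())

    def common_state(self) -> Optional[int]:
        """The shared target if every core requested the same state (block 43)."""
        states = set(self.requested.values())
        if len(states) == 1 and C0 not in states:
            return states.pop()
        return None


table = IdleStatusTable(num_cores=2)
table.detect_command(0, 4)    # core_0 asks for C4 while core_1 is still active
print(table.all_idle())       # False: core_0's request cannot be honored alone
table.detect_command(1, 4)
print(table.common_state())   # 4
```

The point of the table is that a decision for one core reads the entries of all cores, which is what distinguishes this approach from single-core power management.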

FIG. 4 shows one approach to managing power consumption in greater detail at block 38′. The illustrated block 40 provides for initiating a dedicated power saving feature in the core before transitioning the core to the idle state. Such an approach enables the core to enter a state that is equivalent to the idle state and may enable the core to achieve power savings above and beyond those provided by the idle state itself. For example, if the requested idle state is generally associated with the gating of a dedicated clock, block 40 can incorporate such a feature. Block 42 provides for determining whether each of the plurality of cores is ready to enter an idle state (i.e., none of the plurality of cores is active). If so, block 43 provides for determining whether each of the plurality of cores has detected a command requesting a transition to a common (i.e., the same) idle state.

If not all of the cores are transitioning to the same idle state, the shallowest state among the plurality of cores is selected as the idle state at block 52. Thus, if the first core is in a C2 equivalent state (i.e., “CC2” state) and the second core is in a C3 equivalent state (i.e., “CC3” state), the shallowest state would be the C2/CC2 equivalent state. The chipset therefore sees a unified interface to the processor even though the cores may be in multiple different idle states internally. Such an approach represents a significant departure from conventional single core and multi-processor architectures. Once the appropriate idle state has been identified, a shared power saving feature is initiated at block 44. It should be noted that transitioning to the idle state typically involves gating the clocks and halting execution. The power saving features initiated at blocks 40 and 44, however, are implemented while clocks are available and the core(s) are still running. This technique can provide substantial advantages over conventional approaches.
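
The selection at blocks 43 and 52 reduces to taking the minimum requested depth, as the hypothetical helper below sketches. It assumes each core's CCx state is encoded as the integer x (x >= 1) and returns the single state name the chipset would see; the function name is illustrative.

```python
from typing import Dict


def chipset_view(core_states: Dict[str, int]) -> str:
    """Return the single C-state the chipset sees for a fully idle package.

    `core_states` maps each core to the depth x of its CCx state (x >= 1).
    """
    if not core_states or min(core_states.values()) < 1:
        raise ValueError("every core must be in some CCx state (x >= 1)")
    # Block 43: if every core requested the same state, min() returns it.
    # Block 52: if the requests differ, min() returns the shallowest one.
    return f"C{min(core_states.values())}"


print(chipset_view({"core_0": 2, "core_1": 3}))  # C2  (CC2 + CC3)
print(chipset_view({"core_0": 4, "core_1": 4}))  # C4
```

Choosing the shallowest state is the conservative option: no core is driven deeper than it asked to go.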

External break events such as interrupts, exceptions and monitor events are prevented from reaching the plurality of cores at block 46 while the shared state entry procedure is in progress. Break events can be inhibited in a variety of ways. For example, one approach would be to provide a special interface into each of the cores' break logic. Another approach would be to physically separate the cores from all break sources. If a break event is detected after the shared state is reached, the shared state is exited. Such an exit can be achieved in a number of ways. For example, the chipset could detect the break event and/or initiate the exit sequence, or logic could be provided within the processor to detect the break event and/or initiate the exit sequence. When the multi-core processor exits the idle state, the inhibiting of external break events can be discontinued. Block 48 provides for transitioning the plurality of cores to the idle state. Transitioning the cores to the idle state can involve issuing a signal such as a read transaction, specialized bus message or sideband signal to the chipset. For example, one approach is to initiate a well-documented handshake sequence with the chipset in which sleep (i.e., SLP), deep sleep (i.e., DPSLP) and deeper sleep (i.e., DPRSLP) state signals are transferred between the processor and the chipset.
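
The break-event handling above can be sketched as a gate that holds events during shared-state entry and releases them on exit. This is a simplified, hypothetical model: `BreakGate` and `enter_shared_state` are illustrative names, the handshake is reduced to the three state names listed above, and the exit path only records that an exit occurs rather than modeling the chipset's wake-up sequence.

```python
from collections import deque
from typing import Deque, List

# Handshake order named in the text: sleep, deep sleep, deeper sleep.
HANDSHAKE = ("SLP", "DPSLP", "DPRSLP")


class BreakGate:
    """Holds external break events while the shared entry procedure runs."""

    def __init__(self) -> None:
        self.inhibited = False
        self.pending: Deque[str] = deque()
        self.delivered: List[str] = []

    def inhibit(self) -> None:
        self.inhibited = True

    def raise_event(self, event: str) -> None:
        # Block 46: while inhibited, the event is held instead of reaching a core.
        if self.inhibited:
            self.pending.append(event)
        else:
            self.delivered.append(event)

    def release(self) -> None:
        # On exit from the idle state, stop inhibiting and deliver held events.
        self.inhibited = False
        while self.pending:
            self.delivered.append(self.pending.popleft())


def enter_shared_state(events_during_entry: List[str]) -> List[str]:
    """Return the steps taken for one attempt at entering the shared state."""
    gate = BreakGate()
    gate.inhibit()
    log = ["inhibit breaks"]
    for event in events_during_entry:
        gate.raise_event(event)      # held, not delivered, during entry
    log.extend(HANDSHAKE)            # block 48: chipset handshake
    if gate.pending:                 # a held break is seen once the state is reached
        log.append("exit shared state")
        gate.release()
        log.extend(f"deliver {event}" for event in gate.delivered)
    return log


print(enter_shared_state(["timer interrupt"]))
```

Holding the event, rather than dropping it, matters: the interrupt is still serviced, just after the entry procedure can no longer be torn by it.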

If it is determined at block 42 that one or more of the plurality of cores is active, block 56 provides for determining whether the idle state is associated with a resource that is shared by the plurality of cores. As already noted, the shared resource might be a frequency and/or core voltage setting. An example of such a state could be the C4 state. If the idle state is associated with a shared resource, the core is transitioned to a tentative state at block 58 until each of the plurality of cores has detected a command requesting a transition to the idle state. Otherwise, the core can be transitioned to the requested state at block 57. Block 50 provides for halting execution of the core.
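
Putting the FIG. 4 decisions together, one idle request can be walked through the blocks as in the sketch below. It assumes, for illustration only, that C4 is the only state tied to a shared resource (`SHARED_RESOURCE_STATES` is hypothetical), encodes an active core as 0, and returns the actions as text labels keyed to the block numbers above.

```python
from typing import Dict, List

# Assumption for illustration: only C4 depends on a package-wide V/f setting.
SHARED_RESOURCE_STATES = {4}


def manage_idle_request(core: int, target: int, requested: Dict[int, int]) -> List[str]:
    """Walk one idle request through the FIG. 4 decisions.

    `requested` maps every core to its pending request (0 means active/C0)
    and is updated in place. Returns the actions taken, in order.
    """
    requested[core] = target
    actions = [f"block 40: dedicated power saving on core {core}"]
    if all(state != 0 for state in requested.values()):      # block 42
        package_state = min(requested.values())              # blocks 43/52
        actions += [
            "block 44: shared power saving",
            "block 46: inhibit break events",
            f"block 48: transition all cores to C{package_state}",
        ]
    elif target in SHARED_RESOURCE_STATES:                    # block 56
        actions.append(f"block 58: tentative state pending C{target}")
    else:
        actions.append(f"block 57: enter C{target}")
    actions.append("block 50: halt execution")
    return actions


requests = {0: 0, 1: 0}
print(manage_idle_request(0, 4, requests)[1])  # block 58: tentative state pending C4
print(manage_idle_request(1, 4, requests)[3])  # block 48: transition all cores to C4
```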

Turning now to FIGS. 5A and 5B, approaches to initiating dedicated and shared power saving features are shown in greater detail at blocks 40′ and 44′, respectively. In particular, the L1 cache is flushed into the L2 cache at block 60 and the L1 cache is placed in a low-power non-snoopable state at block 62. If the flushed data is not already in the L2 cache (which handles snoops while the L1 cache is in the non-snoopable state), the data can be further flushed to system memory. The L1 flushing feature may be used for the C3 and CC3 states. Block 64 provides for gating a dedicated clock of the core.
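
A toy Python model of blocks 60 through 64 is shown below. Caches are plain dictionaries from address to value, "flush" is modeled as write back plus invalidate, and `CoreModel`, `dedicated_power_saving` and `snoop` are hypothetical names; the point is only that snoops keep working once L2 answers for the non-snoopable L1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CoreModel:
    """Hypothetical per-core state; addresses map to cached values."""
    l1: Dict[int, int] = field(default_factory=dict)
    l1_snoopable: bool = True
    clock_gated: bool = False


def dedicated_power_saving(core: CoreModel, shared_l2: Dict[int, int]) -> None:
    """Blocks 60-64: flush L1 into L2, make L1 non-snoopable, gate the clock."""
    shared_l2.update(core.l1)   # block 60: write every L1 line back into L2
    core.l1.clear()             # ...and invalidate it in L1
    core.l1_snoopable = False   # block 62: low-power, non-snoopable L1
    core.clock_gated = True     # block 64: gate the dedicated clock input


def snoop(core: CoreModel, shared_l2: Dict[int, int], addr: int) -> Optional[int]:
    """While L1 is non-snoopable, the shared L2 answers snoops on its behalf."""
    if core.l1_snoopable and addr in core.l1:
        return core.l1[addr]
    return shared_l2.get(addr)


core = CoreModel(l1={0x40: 7})
l2 = {0x80: 1}
dedicated_power_saving(core, l2)
print(snoop(core, l2, 0x40), core.clock_gated)  # 7 True
```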

If all cores are ready to enter an idle state, block 66 provides for reducing a performance state of the processor. Performance states typically involve the adjustment of shared resource settings such as core voltage and/or frequency. The following table demonstrates one example of multiple performance state settings that can be used for a processor core.

TABLE II
P-state  Frequency  Voltage
P0       1.6 GHz    1.484 V
P1       1.4 GHz    1.420 V
P2       1.2 GHz    1.276 V
P3       1.0 GHz    1.164 V
P4       800 MHz    1.036 V
P5       600 MHz    0.956 V
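
To give a rough sense of what the block 66 reduction buys, the sketch below encodes Table II and estimates relative dynamic power. It uses the common CMOS approximation that dynamic power scales with V^2 * f. That approximation is not taken from this description, and the estimate ignores leakage and assumes constant switched capacitance. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Table II: P-state -> (frequency in MHz, voltage in volts)
P_STATES: Dict[str, Tuple[int, float]] = {
    "P0": (1600, 1.484),
    "P1": (1400, 1.420),
    "P2": (1200, 1.276),
    "P3": (1000, 1.164),
    "P4": (800, 1.036),
    "P5": (600, 0.956),
}


def relative_dynamic_power(p_state: str, reference: str = "P0") -> float:
    """Approximate dynamic power ratio using P ~ C * V^2 * f
    (capacitance assumed constant, leakage ignored)."""
    f, v = P_STATES[p_state]
    f_ref, v_ref = P_STATES[reference]
    return (f * v * v) / (f_ref * v_ref * v_ref)


def lowest_performance_state() -> str:
    """Block 66 sketch: pick the lowest-frequency entry once all cores are idle."""
    return min(P_STATES, key=lambda name: P_STATES[name][0])


print(lowest_performance_state())                # P5
print(round(relative_dynamic_power("P5"), 3))    # 0.156
```

Under this approximation, dropping from P0 to P5 cuts dynamic power to roughly 16% of the P0 figure, because voltage enters squared.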

An execution context of the processor can be saved at block 70 and a shared phase locked loop (PLL) can be shut down at block 68. In the illustrated approach, the PLL shutdown can be conducted after the chipset handshake sequence has been completed. As already noted, by initiating advanced power saving features such as these while the core is still able to execute instructions, the illustrated approach provides significant advantages over conventional techniques.
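
The ordering constraints stated here can be written as a small checker. Two constraints are covered: the PLL shutdown happens only after the chipset handshake, and, per claim 3, the shared power saving features are initiated before execution is halted. The step names are hypothetical labels, not signals defined by this description, and a missing prerequisite is reported as a violation.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical step labels; each step maps to the steps that must precede it.
REQUIRED_BEFORE: Dict[str, Set[str]] = {
    "shut_down_pll": {"chipset_handshake_complete"},  # PLL stays on until then
    "halt": {"reduce_p_state", "save_context"},       # done while still running
}


def check_order(steps: Sequence[str]) -> List[str]:
    """Return the ordering violations found in `steps` (empty list = OK)."""
    position = {step: i for i, step in enumerate(steps)}
    problems = []
    for step, prerequisites in REQUIRED_BEFORE.items():
        if step not in position:
            continue
        for before in sorted(prerequisites):
            # A missing prerequisite is treated as coming too late.
            if position.get(before, len(steps)) > position[step]:
                problems.append(f"{before} must precede {step}")
    return problems


good = ["reduce_p_state", "save_context", "chipset_handshake_complete", "shut_down_pll"]
print(check_order(good))  # []
print(check_order(["shut_down_pll", "chipset_handshake_complete"]))
```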

FIG. 6A shows one approach to detecting an idle state transition command in greater detail at block 72. Thus, block 72 can be readily substituted for block 36 (FIG. 3) discussed above. In particular, the illustrated block 74 provides for detecting a first command that identifies an address. One such command might be a MONITOR command. A second command is detected at block 76, where the second command instructs the core to wait in an idle state until the address is encountered. One such command might be an MWAIT(Cx) command, where “x” signifies the target idle state. The MWAIT approach could be implemented in a processor driver that is optimized to support multi-core operation.
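
MONITOR and MWAIT are x86 instructions that are normally executed by privileged code, so they cannot be issued from Python. The sketch below only computes the hint value MWAIT takes in EAX, using the encoding given in the Intel Software Developer's Manual: bits 7:4 hold the target C-state minus one, and bits 3:0 hold a sub-state. Those hint C-states are processor-specific and need not match the ACPI C-state numbers used in this description. Mapping the MWAIT(Cx) notation onto them is therefore an assumption, and `mwait_hint` is a hypothetical helper.

```python
def mwait_hint(target_c_state: int, sub_state: int = 0) -> int:
    """Build an MWAIT EAX hint: bits 7:4 = target C-state - 1, bits 3:0 = sub-state.

    A field value of 0 means C1, 1 means C2, and so on; 0b1111 in bits 7:4
    is reserved to mean C0, so targets run from C1 to C15.
    """
    if not 1 <= target_c_state <= 15:
        raise ValueError("target C-state must be C1..C15")
    if not 0 <= sub_state <= 0xF:
        raise ValueError("sub-state must fit in 4 bits")
    return ((target_c_state - 1) << 4) | sub_state


print(hex(mwait_hint(4)))  # 0x30
print(hex(mwait_hint(2)))  # 0x10
```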

FIG. 6B shows an alternative approach to detecting an idle state transition command in greater detail at block 72′. Thus, block 72′ can be readily substituted for block 36 discussed above. In particular, the illustrated block 78 provides for receiving an I/O read transaction that identifies the idle state. One such transaction might be a Levelx_Rd transaction, where “x” signifies the target idle state. This type of command could be issued by the chipset and/or OS. Block 80 provides for translating the I/O read transaction into a second command that instructs the core to wait in the idle state until an address is encountered. Thus, the I/O read transaction could be translated into an MWAIT command.
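
A hypothetical sketch of the block 80 translation follows. It relies on the ACPI processor control block layout, in which reading the P_LVL2 register at P_BLK+4 requests C2 and reading P_LVL3 at P_BLK+5 requests C3. The base address 0x410 and the extra port for C4 are invented for illustration; a real platform would report deeper levels through its own tables (for example _CST). The functions return a text label rather than executing MWAIT.

```python
from typing import Dict, Optional

P_BLK_BASE = 0x410  # hypothetical processor control block I/O base address


def build_level_map(p_blk: int, extra: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Map I/O port -> target C-state.

    ACPI places P_LVL2 at P_BLK+4 and P_LVL3 at P_BLK+5; `extra` can add
    platform-specific ports for deeper levels.
    """
    levels = {p_blk + 4: 2, p_blk + 5: 3}
    levels.update(extra or {})
    return levels


def translate_io_read(port: int, level_map: Dict[int, int]) -> Optional[str]:
    """Block 80: turn a Levelx_Rd read into an MWAIT(Cx) request, or None if
    the port is not a level register and the read should proceed normally."""
    c_state = level_map.get(port)
    return None if c_state is None else f"MWAIT(C{c_state})"


levels = build_level_map(P_BLK_BASE, extra={0x416: 4})  # 0x416 -> C4 is assumed
print(translate_io_read(0x414, levels))  # MWAIT(C2)
print(translate_io_read(0x416, levels))  # MWAIT(C4)
print(translate_io_read(0x80, levels))   # None
```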

Turning now to FIG. 7, a specific example of a multi-core state machine 82 is shown. State machine 82 will be described in reference to a target state of the C4 state for the purposes of discussion. Consider, for example, a case in which both cores are in the active state C0, which is illustrated as states 84 and 86. If the first core (i.e., core_0) receives an MWAIT(C4) command (or I/O read transaction), the first core will be placed in a tentative state 90 at arrow 88. The tentative state 90 is illustrated as “CC3(C4)”. The first core will initiate various dedicated power saving features such as flushing the L1 cache and gating the dedicated clock of the first core. If an interrupt or the specified MONITOR address is encountered, the first core will “break” to the active state 84 at arrow 92. While the first core is in the tentative state 90, the hardware coordination logic will monitor the second core (i.e., core_1) to detect when the second core is ready to transition to the C4 state. If the second core receives a request to transition to the C4 state while the first core is in the tentative state 90, the second core will transition to the tentative state 94 at arrow 96.

The hardware coordination logic will then determine that both cores have detected a command requesting a transition to the C4 state, and may initiate more advanced power saving features such as a performance state reduction, a shutdown of a shared PLL or a saving of an execution context of the processor. The coordination logic can also prevent external break events from reaching the cores at state 98. Once external break events have been inhibited, the coordination logic can transition both cores to the C4 state. In particular, an I/O read transaction can be issued to the chipset at arrow 100, where the cores await completion notification in state 102. Upon receipt of the chipset acknowledgment (e.g., STPCLK pin assertion) and the I/O-cycle completion notification, the coordination logic issues a stop grant signal to the chipset at arrow 104 and waits in the Stop_GNT state 106. The entire processor is then sequenced through the sleep (i.e., SLP), deep sleep (i.e., DPSLP) and deeper sleep (i.e., DPRSLP) states, where the deep sleep state and the deeper sleep state correspond to the traditional C3 and C4 states, respectively.
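
The FIG. 7 walkthrough can be condensed into a small, hypothetical Python model for two cores and a C4 target. State names follow the figure. The class and method names are illustrative, and the package-side steps (states 98 through 106 and the sleep states) are recorded as a trace rather than driven by real chipset signals.

```python
from typing import Dict, List

TENTATIVE = "CC3(C4)"  # tentative states 90 and 94
# State 98, arrow 100 / state 102, arrow 104 / Stop_GNT state 106, then sleep states.
PACKAGE_SEQUENCE = ["inhibit breaks", "await I/O completion", "Stop_GNT",
                    "SLP", "DPSLP", "DPRSLP"]


class Fig7Model:
    """Two-core idle state machine targeting C4 (illustrative only)."""

    def __init__(self) -> None:
        self.cores: Dict[str, str] = {"core_0": "C0", "core_1": "C0"}
        self.package_trace: List[str] = []

    def request_c4(self, core: str) -> None:
        """MWAIT(C4) or the equivalent I/O read (arrows 88 and 96)."""
        if self.cores[core] == "C0":
            self.cores[core] = TENTATIVE
        # Once every core is tentative, the coordination logic takes over.
        if all(state == TENTATIVE for state in self.cores.values()):
            self.package_trace.extend(PACKAGE_SEQUENCE)
            for name in self.cores:
                self.cores[name] = "C4"

    def break_event(self, core: str) -> None:
        """An interrupt or MONITOR hit while tentative returns to C0 (arrow 92)."""
        if self.cores[core] == TENTATIVE:
            self.cores[core] = "C0"


fsm = Fig7Model()
fsm.request_c4("core_0")
fsm.break_event("core_0")   # core_0 breaks back to C0 before core_1 is ready
fsm.request_c4("core_0")
fsm.request_c4("core_1")    # both tentative: package enters C4
print(fsm.cores)            # {'core_0': 'C4', 'core_1': 'C4'}
```

Break events that arrive after the package sequence has started are deliberately not modeled, since the coordination logic inhibits them at state 98.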

Thus, a number of advantages can be achieved through the various techniques described herein. For example, enabling software to initiate different idle state commands per core provides maximum flexibility and power savings. Furthermore, by internally analyzing target idle states on a per-core basis (versus external-only sequencing), advanced power management activities can be initiated while clocks are available and the core(s) are still running. It should also be noted that independent idle states can be established for each core while presenting a common “shallowest” state to the chipset and other system components. The result is a highly scalable, yet sophisticated solution. Simply put, hardware coordination of idle states in a multi-core environment as discussed herein can provide substantial benefits over conventional architectures and/or techniques.

Those skilled in the art can appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.

1. A method comprising: detecting a command at a core of a processor having a plurality of cores, the command requesting a transition of the core to an idle state; and managing power consumption of the core based on the command and an idle state status of each of the plurality of cores; wherein managing power consumption of the core includes: determining whether each of the plurality of cores has detected a command requesting a transition to a common state, and selecting a shallowest power conservation state among the plurality of cores as the idle state if each of the plurality of cores has not detected a command requesting a transition to the common state.
2. The method of claim 1, further including: initiating a dedicated power saving feature in the core before transitioning the core to the idle state; and halting execution of the core.
3. The method of claim 2, wherein managing the power consumption further includes: determining that none of the plurality of cores is active; initiating a shared power saving feature before the execution is halted; preventing external break events from reaching the plurality of cores; and transitioning the plurality of cores to the idle state.
4. The method of claim 3, wherein transitioning the plurality of cores to the idle state includes issuing a signal to a chipset, the signal being selected from a group comprising a read transaction, a bus message and a sideband signal.
5. The method of claim 3, wherein initiating the shared power saving feature includes initiating a process selected from a group comprising reducing a performance state of the processor, shutting down a shared phase locked loop and saving an execution context of the processor.
6. The method of claim 2, wherein managing the power consumption further includes: determining that one or more of the plurality of cores is active; and determining whether the idle state is associated with a resource that is shared by the plurality of cores.
7. The method of claim 6, wherein managing the power consumption further includes: transitioning the core to a tentative state until each of the plurality of cores has detected a command requesting a transition to the idle state if the idle state is associated with a resource that is shared by the plurality of cores; and transitioning the core to the idle state if the idle state is not associated with a resource that is shared by the plurality of cores.
8. The method of claim 2, wherein initiating the dedicated power saving feature includes: flushing a level one cache of the core into a level two cache of the processor; placing the level one cache in a non-snoopable state; and gating a dedicated clock of the core.
9. The method of claim 1, wherein the detecting includes: detecting a first command that identifies an address; and detecting a second command that instructs the core to wait in the idle state until the address is encountered.
10. The method of claim 1, wherein the detecting includes: receiving an input/output (I/O) read transaction that identifies the idle state; and translating the I/O read transaction into a second command that instructs the core to wait in the idle state until an address is encountered.
11. An apparatus comprising: a processor having a plurality of cores, the plurality of cores including a core to detect a command that requests a transition of the core to an idle state, the processor having hardware coordination logic to manage power consumption of the core based on the command and an idle state status of each of the plurality of cores, wherein the coordination logic is to determine whether each of the plurality of cores has detected a command requesting a transition to a common state and select a shallowest state among the plurality of cores as the idle state if each of the plurality of cores has not detected a command requesting a transition to the common state.
12. The apparatus of claim 11, wherein the core is to initiate a dedicated power saving feature in the core before the core is transitioned to the idle state, and halt execution of the core.
13. The apparatus of claim 12, wherein the coordination logic is to determine that none of the plurality of cores is active, initiate a shared power saving feature before the execution is halted, prevent external break events from reaching the plurality of cores and transition the plurality of cores to the idle state.
14. The apparatus of claim 13, wherein the coordination logic is to issue a signal to a chipset to transition the plurality of cores to the idle state, the signal to be selected from a group comprising a read transaction, a bus message and a sideband signal.
15. The apparatus of claim 13, wherein the coordination logic is to initiate a process selected from a group comprising reducing a performance state of the processor, shutting down a shared phase locked loop and saving an execution context of the processor to initiate the shared power saving feature.
16. The apparatus of claim 12, wherein the coordination logic is to determine that one or more of the plurality of cores is active and determine whether the idle state is associated with a resource that is shared by the plurality of cores.
17. The apparatus of claim 16, further including a resource that is shared by the plurality of cores, the core to transition itself to a tentative state until each of the plurality of cores has detected a command requesting a transition to the idle state if the idle state is associated with the resource and transition itself to the idle state if the idle state is not associated with the resource.
18. The apparatus of claim 12, further including: a level one cache that is dedicated to the core; a level two cache that is shared by the plurality of cores; and a clock that is dedicated to the core, the core to flush the level one cache into the level two cache, place the level one cache in a non-snoopable state and gate the clock.
19. The apparatus of claim 11, wherein the core is to detect a first command that identifies an address and detect a second command that instructs the core to wait in the idle state until the address is encountered.
20. The apparatus of claim 11, wherein the core is to receive an input/output (I/O) read transaction that identifies the idle state and translate the I/O read transaction into a second command that instructs the core to wait in the idle state until an address is encountered.
21. A system comprising: a random access memory to store instructions; and a processor having a plurality of cores to execute the instructions, the plurality of cores including a core to detect a command that requests a transition of the core to an idle state, the processor having hardware coordination logic to manage power consumption of the core based on the command and an idle status of each of the plurality of cores, wherein the coordination logic is to determine whether each of the plurality of cores has detected a command requesting a transition to a common state and select a shallowest state among the plurality of cores as the idle state if each of the plurality of cores has not detected a command requesting a transition to the common state.
22. The system of claim 21, wherein the core is to initiate a dedicated power saving feature in the core before the core is transitioned to the idle state, and halt execution of the core.
23. The system of claim 22, wherein the coordination logic is to determine that none of the plurality of cores is active, initiate a shared power saving feature before the execution is halted, prevent external break events from reaching the plurality of cores and transition the plurality of cores to the idle state.
24. The system of claim 23, further including a chipset disposed between the processor and the memory, the coordination logic to issue a signal to the chipset to transition the plurality of cores to the idle state, the signal to be selected from a group comprising a read transaction, a bus message and a sideband signal.
25. The system of claim 21, wherein the core is to detect a command that requests a transition of the core to a C-state.
26. A method comprising: detecting a command at a core of a processor having a plurality of cores, the command requesting a transition of the core to a C-state; initiating a dedicated power saving feature in the core before transitioning the core to the C-state; determining whether each of the plurality of the cores is active; if none of the plurality of cores is active, determining whether each of the plurality of cores has detected a command requesting a transition to a common state, and selecting a shallowest state among the plurality of cores as the C-state if each of the plurality of cores has not detected a command requesting a transition to the common state; if one or more of the plurality of cores is active, determining whether the C-state is associated with a resource that is shared by the plurality of cores; if the C-state is associated with a resource that is shared by the plurality of cores, transitioning the core to a tentative state until each of the plurality of cores has detected a command requesting a transition to the C-state; if the C-state is not associated with a resource that is shared by the plurality of cores, transitioning the core to the C-state; and halting execution of the core.
27. The method of claim 26, wherein initiating the dedicated power saving feature includes: flushing a level one cache of the core into a level two cache of the processor; placing the level one cache in a non-snoopable state; and gating a dedicated clock of the core.
28. The method of claim 26, wherein if none of the plurality of cores is active the method further includes: initiating a shared power saving feature, preventing external break events from reaching the plurality of cores and transitioning the plurality of cores to the C-state.
29. The method of claim 28, wherein initiating the shared power saving feature includes initiating a process selected from a group comprising reducing a performance state of the processor, shutting down a phase locked loop and saving an execution context of the processor.